Get Page (Web Mining)
Synopsis
Gets a page via HTTP.
Description
This operator sends a GET request via HTTP. The returned page is output as a document.
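For orientation, the behaviour described above can be approximated outside the operator. The following is a minimal sketch using only the Python standard library, not the operator's actual implementation. The function name get_page, the parameter timeout_s, and the UTF-8 fallback are assumptions made for illustration.

```python
from urllib.request import urlopen


def get_page(url: str, timeout_s: float = 10.0) -> str:
    """Send a GET request and return the response body as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout_s) as response:
        raw = response.read()
        # Use the charset announced in the Content-Type header;
        # falling back to UTF-8 is an assumption of this sketch.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

Calling get_page with a URL returns the page content as a single string, which plays the role of the document delivered at the output port.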
Output
- output
The output port.
Parameters
- url
The URL to read from.
- random user agent
Chooses a user agent at random from a set of 7000 user agents.
- user agent
The user agent property.
- connection timeout
The timeout (in ms) for the connection.
- read timeout
The timeout (in ms) for reading from the URL.
- follow redirects
Specifies whether redirects should be followed.
- accept cookies
Specifies whether cookies should be accepted.
- cookie scope
Specifies the scope of the cookies used.
- request method
Specifies the request method.
- query parameters
The query parameters as key/value pairs.
- request properties
The properties that are sent with the HTTP request. Define them to match the needs of your web service.
- override encoding
Normally, the encoding of the retrieved page is determined automatically. In rare cases this fails or the server reports a wrong encoding. Enable this option to override the automatically detected encoding.
- encoding
The encoding used for reading or writing files.
- keep sensitive headers
Keeps the "Authorization" and "Cookie" headers during a redirect to a different domain or subdomain.
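To make the role of some of these parameters more concrete, the sketch below shows how query parameters, request properties, the user agent, a timeout, and an encoding override might map onto a plain HTTP request. It uses the Python standard library and is an illustration under stated assumptions, not the operator's implementation. The function names build_request and fetch and their argument names are hypothetical. Redirect and cookie handling are not covered, and urllib exposes a single timeout rather than separate connection and read timeouts.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode, urlsplit, urlunsplit
from urllib.request import Request, urlopen


def build_request(
    url: str,
    query_parameters: Optional[Mapping[str, str]] = None,
    request_properties: Optional[Mapping[str, str]] = None,
    user_agent: Optional[str] = None,
    request_method: str = "GET",
) -> Request:
    """Assemble a request from operator-style settings (hypothetical helper)."""
    parts = urlsplit(url)
    query = parts.query
    if query_parameters:
        # Append the key/value pairs to any query string already in the URL.
        extra = urlencode(query_parameters)
        query = f"{query}&{extra}" if query else extra
    full_url = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )

    # Request properties are sent as HTTP headers.
    headers = dict(request_properties or {})
    if user_agent:
        headers["User-Agent"] = user_agent
    return Request(full_url, headers=headers, method=request_method)


def fetch(
    request: Request,
    timeout_ms: int = 10000,
    override_encoding: Optional[str] = None,
) -> str:
    """Send the request and decode the body (hypothetical helper)."""
    # The parameters above are given in milliseconds; urlopen expects seconds.
    with urlopen(request, timeout=timeout_ms / 1000) as response:
        raw = response.read()
        detected = response.headers.get_content_charset()
    # An explicit override takes precedence over the detected charset;
    # the final UTF-8 fallback is an assumption of this sketch.
    return raw.decode(override_encoding or detected or "utf-8", errors="replace")
```

A request built with build_request("http://example.com/search?a=1", {"q": "web mining"}) targets http://example.com/search?a=1&q=web+mining. Passing override_encoding to fetch corresponds to enabling override encoding and choosing a value for encoding.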